Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

UniqueProt: creating representative protein sequence sets

UniqueProt is a practical and easy to use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm using the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program in the sense that the 'representatives' are not at the centres of well-defined clus...

متن کامل

Complexity of choosing subsets from color sets

We raise and investigate the algorithmic complexity of the following problem. Given a graph G = (V; E) and p-element sets L(v) for its vertices v 2 V such that jL(u) L(v)j p + r for all edges uv 2 E, do there exist q-element subsets C (v) L(v) with C (u) \ C (v) = ; for all uv 2 E ? Here p; q; r are positive integers, p q and p + r 2q. We characterize precisely which triples (p; q; r) admit a p...

متن کامل

Eliminating redundancy among protein sequences using submodular optimization

Motivation: Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success in many fields but is not yet widely used in biology. We apply submodular optimization to the problem of removing redundancy in protein sequence data sets. This is a common step in many bioinformatics and structural biology workflows, including creation of non-redundant t...

متن کامل

Selection of representative protein data sets.

The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to ex...

متن کامل

Mining Non-Redundant Sets of Generalizing Patterns from Sequence Databases

Sequential pattern mining techniques extract patterns corresponding to frequent subsequences from a sequence database. A practical limitation of these techniques is that they overload the user with too many patterns. Local Process Model (LPM) mining is an alternative approach coming from the field of process mining. While in traditional sequential pattern mining, a pattern describes one subsequ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Proteins: Structure, Function, and Bioinformatics

سال: 2018

ISSN: 0887-3585

DOI: 10.1002/prot.25461